Plan Optimization to Bilingual Dictionary Induction for Low-resource Language Families
Abstract
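Creating a bilingual dictionary is the first crucial step in enriching low-resource languages. Especially for closely related languages, it has been shown that the constraint-based approach is useful for inducing bilingual lexicons from two bilingual dictionaries via a pivot language. However, if no machine-readable dictionaries are available as input, we need to consider manual creation by bilingual native speakers. To reach the goal of comprehensively creating multiple bilingual dictionaries, even if several machine-readable bilingual dictionaries already exist, it is still difficult to determine the execution order of the constraint-based approach that reduces the total cost. Plan optimization is crucial for composing the order in which bilingual dictionaries are created, with consideration of the creation methods and their costs. We formalize plan optimization for creating bilingual dictionaries using a Markov Decision Process (MDP), with the goal of obtaining a more accurate estimate of the most feasible optimal plan with the least total cost before fully implementing constraint-based bilingual lexicon induction. We model a prior beta distribution of bilingual lexicon induction precision with language similarity and polysemy of the topology as the α and β parameters. It is further used to model the cost function and the state transition probability. We estimated the cost of all investment plans as a baseline for evaluating the proposed MDP-based approach, with total cost as the evaluation metric. After using the posterior beta distribution from the first batch of experiments to construct the prior beta distribution for the second batch, the results show a 61.5% cost reduction compared to the estimated all investment plan and a 39.4% cost reduction compared to the estimated MDP optimal plan. The proposed MDP-based approach outperformed the baseline in terms of total cost.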
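The Bayesian step described above, in which the posterior from the first batch becomes the prior for the second, can be illustrated with the standard Beta-Binomial conjugate update. This is a minimal hypothetical sketch, not the authors' implementation: the class `BetaPrecision`, the helper `expected_pairs_to_fix`, the prior values, and the batch counts are all made up for illustration. The sketch does not reproduce how the paper derives α from language similarity and β from polysemy, and the "pairs to fix" cost proxy is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrecision:
    """Beta(alpha, beta) belief over the precision of an induced dictionary.

    Hypothetical illustration: alpha and beta are plain pseudo-counts here,
    not the paper's similarity- and polysemy-derived values.
    """
    alpha: float
    beta: float

    def mean(self) -> float:
        # Expected precision under the Beta distribution: alpha / (alpha + beta).
        return self.alpha / (self.alpha + self.beta)

    def update(self, correct: int, incorrect: int) -> "BetaPrecision":
        # Beta-Binomial conjugacy: each correct translation pair adds to alpha,
        # each incorrect pair adds to beta. Returns a new belief object.
        if correct < 0 or incorrect < 0:
            raise ValueError("counts must be non-negative")
        return BetaPrecision(self.alpha + correct, self.beta + incorrect)


def expected_pairs_to_fix(belief: BetaPrecision, induced_pairs: int) -> float:
    # Hypothetical cost proxy (assumption): the number of induced pairs a
    # native speaker is expected to correct, induced_pairs * (1 - precision).
    return induced_pairs * (1.0 - belief.mean())


if __name__ == "__main__":
    prior_batch1 = BetaPrecision(alpha=8.0, beta=2.0)  # made-up prior
    # Made-up evaluation of the first batch: 70 correct, 30 incorrect pairs.
    posterior_batch1 = prior_batch1.update(correct=70, incorrect=30)
    # The first batch's posterior serves as the second batch's prior.
    prior_batch2 = posterior_batch1
    print(round(prior_batch2.mean(), 4))  # prints 0.7091
    print(round(expected_pairs_to_fix(prior_batch2, 1000), 1))  # prints 290.9
```

The belief is a frozen dataclass so that each batch's prior and posterior stay distinct objects; an MDP planner could then read the expected precision from whichever belief is current when estimating transition probabilities and costs.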
Similar Resources
Bilingual Dictionary Induction as an Optimization Problem
Bilingual dictionaries are vital in many areas of natural language processing, but such resources are rarely available for lower-density language pairs, especially for those that are closely related. Pivot-based induction consists of using a third language to bridge a language pair. As an approach to create new dictionaries, it can generate wrong translations due to polysemy and ambiguous words...
Bilingual Lexicon Induction for Low-resource Languages
Statistical machine translation relies on the availability of substantial amounts of human translated texts. Such bilingual resources are available for relatively few language pairs, which presents obstacles to applying current statistical translation models to low-resource languages. In this work, we induce bilingual dictionaries from more plentiful monolingual corpora using a diverse set of c...
Bilingual Sign Language Dictionary
The Spanish Sign Language Dictionary (DILSE) is one of the first truly bilingual (Spanish Sign Language-Spanish) electronic dictionaries for the deaf community. The properties of this format are perfectly matched to a visual language such as sign language, which uses space as a means of expression. Additionally, two-way searches for word entries are possible from either Spanish or signs. The si...
Bilingual dictionary generation for low-resourced language pairs
Bilingual dictionaries are vital resources in many areas of natural language processing. Numerous methods of machine translation require bilingual dictionaries with large coverage, but less-frequent language pairs rarely have any digitalized resources. Since the need for these resources is increasing, but the human resources are scarce for less represented languages, efficient automatized metho...
Knowledge Distillation for Bilingual Dictionary Induction
Leveraging zero-shot learning to learn mapping functions between vector spaces of different languages is a promising approach to bilingual dictionary induction. However, methods using this approach have not yet achieved high accuracy on the task. In this paper, we propose a bridging approach, where our main contribution is a knowledge distillation training objective. As teachers, rich resource ...
Journal
Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing
Year: 2021
ISSN: 2375-4699, 2375-4702
DOI: https://doi.org/10.1145/3448215